Abstract

We have translated fractional Brownian motion (FBM) signals into a
text based on two "letters", as if the signal fluctuations corresponded to
a constant stepsize random walk. We have applied the Zipf method to
extract the $\zeta'$ exponent relating the word frequency and its rank on a
log-log plot. We have studied the variation of the Zipf exponent(s) giving the
relationship between the frequency of occurrence of words of length $m < 8$
made of such two letters: $\zeta'$ varies as a power law in terms of $m$. We
have also examined how the exponent of the Zipf law is influenced by a
linear trend and by its slope. We can distinguish finite
size effects, and results depending on whether the starting FBM is persistent
or not, i.e. depending on the FBM Hurst exponent $H$. The numerical results
indicate that the Zipf exponent of a persistent signal is more
influenced by the trend than that of an antipersistent signal. It appears
that the conjectured law $\zeta' = |2H - 1|$ only holds near $H = 0.5$. We have
also introduced considerations based on the notion of a time dependent
Zipf law along the signal.

1 Introduction

Many phenomena containing discernible events that can be counted can
be ranked according to their frequency, and a so called Zipf plot can be drawn.
Very often a quasi linear relationship is found on a log-log plot. The
slope corresponds to an exponent $s$ describing the frequency $P$ of the cumulative
occurrence of the events according to their rank $R$ through, e.g., $P(>R) \sim R^{-s}$.

Such a (Zipf) power law is found in many cases, with $s \simeq 1$: see the
distribution of income of individuals or companies in countries (Pareto
distribution), the size of companies in economics, earthquakes
(Gutenberg-Richter law), city size distributions, etc.
The universal feature of the Zipf law is thought to originate from
stochastic processes [23], in particular when they can be modeled as random
walks on a log scale [24], though it is still often said in lay language that the
law is a description (or a result?) of uniformity and diversity.

A simple extension of the Zipf analysis is to consider $m$-letter words, i.e.
the words strictly made of $m$ characters without considering the white spaces.
The number $k$ of different letters or characters available in the alphabet should
also be specified. A power law for the word frequency $f(R)$ is expected to be
observed:

$$ f \sim R^{-\zeta}. \tag{1} $$

The Zipf exponent can be estimated from the slope of the best linear
fit on a log-log graph. There is no theory at this time predicting whether the
exponent is a function of $(m, k)$. Elsewhere we have already shown that the
Zipf exponent could be different, call it $\zeta'$, if the frequency $f$ of occurrence is
normalized with respect to the theoretical frequency $f'$, i.e. the one
expected for pure, or unbiased, (stochastic) Brownian processes, thus

$$ f / f' \sim R^{-\zeta'}. \tag{2} $$

E.g. suppose a binary alphabet, i.e. made up of two characters, $u$ and $d$ (or
0 and 1 in electronics), occurring with probabilities $p_u$ and $p_d$; the theoretical frequency
$f'$ of a word of length $m$ containing $n$ characters of, say, type $d$ is

$$ f' = p_u^{m-n} \, p_d^{n}. $$
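
To make the normalization concrete, the following minimal sketch evaluates the theoretical frequency of a given word, assuming that the letters are drawn independently with probabilities $p_u$ and $p_d = 1 - p_u$. The function name `theoretical_frequency` is hypothetical and is only used here for illustration.

```python
from itertools import product

def theoretical_frequency(word, p_u):
    """Frequency expected for `word` if its letters were independent draws.

    Assumes a binary alphabet {'u', 'd'} with p_d = 1 - p_u.
    """
    p_d = 1.0 - p_u
    n_d = word.count("d")      # number n of 'd' characters
    n_u = len(word) - n_d      # the remaining m - n characters are 'u'
    return p_u ** n_u * p_d ** n_d

# The theoretical frequencies of all 2**m words of length m sum to one.
all_words = ["".join(w) for w in product("ud", repeat=3)]
total = sum(theoretical_frequency(w, 0.6) for w in all_words)
```

For an unbiased text ($p_u = p_d = 0.5$) every word of length $m$ gets the same value $2^{-m}$, so the normalization only matters when a bias is present.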

Previous Zipf-like analyses have often neglected the possibility that the $(u, d)$
distribution could be biased over a finite size time interval, so that a finite
tendency might exist, one which could in fact have a quite varied structure, as when in
finance a trend is considered to be mimicked by some moving average. It
seems clear that if a tendency is positive the number of positive volatility values
is larger than the number of negative ones (and conversely). Therefore
the Zipf law exponent might be affected because of some bias in the ranking.
Whether or not the exponent depends on the bias is briefly examined here.

The final goal of this series of investigations is to apply the idea to the study
of financial time series, or more generally to translate a time series into a sentence,
a particular letter corresponding to a particular variation of a signal. This is in
line with previous studies in econophysics, e.g. in order to search whether some
investment strategy can be derived, in particular from a Zipf law observation.
This would fall into the same type of studies as those involving the detrended
fluctuation analysis (DFA).



Here we analyze time series based on a one dimensional fractional Brownian
motion characterized by the so called Hurst exponent $H$ [26].



Why such a series? Because the FBM is not a Markovian process, since its
value at a given time depends on all past points; we therefore consider that it can
be a useful model for financial time series. In fact, Peters
has shown that an FBM is a good model for describing returns (but it does not
work for options) in financial series.



2 Data 

In order to generate an FBM we have used the Rambaldi and Pinazza algorithm.
According to the latter an FBM can be obtained from

$$ B_H(j) = \sum_{i=1}^{j} \omega_{j-i+1} \, \xi_i, \tag{3} $$

where $B_H(j)$ is the "walker" position after $j$ time intervals $\Delta t$, and $\xi_i$ is a random
variable drawn from a Gaussian distribution with zero mean and a given
variance. Thereby the signal is a stochastic one, with a diffusion growing with
time as $t^{2H}$ [31]. The weight function $\omega_{j-i+1}$ is given by

$$ \omega_{j-i+1} = \frac{\gamma}{H + \frac{1}{2}} \left[ (j - i + 1)^{H + 1/2} - (j - i)^{H + 1/2} \right], \tag{4} $$

where $\gamma$ is such that $\langle B_H^2(1) \rangle = 1$. If $H = 1/2$ one has the usual Brownian
motion. The signal is said to be persistent for $H > 1/2$, and antipersistent
for $H < 1/2$. There has been some conjecture, sometimes thought to be
proven like a theorem, that

$$ \zeta' = \zeta = |2H - 1|. \tag{5} $$
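
The weighted sum of Eqs. (3)-(4) can be sketched in a few lines. This is only an illustration of the two formulas as written above, not the full Rambaldi and Pinazza generator; the function name `fbm_weighted_sum`, the unit variance of the Gaussian steps and the choice $\gamma = H + 1/2$ (which gives $\omega_1 = 1$) are assumptions made for the example.

```python
import numpy as np

def fbm_weighted_sum(n_points, hurst, seed=0):
    """Sketch of Eqs. (3)-(4): B_H(j) = sum_{i=1..j} w_{j-i+1} * xi_i."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n_points)            # unit-variance Gaussian steps (assumption)
    a = hurst + 0.5
    gamma = a                                     # gives w_1 = 1, hence <B_H(1)^2> = 1
    k = np.arange(1, n_points + 1, dtype=float)   # k stands for j - i + 1
    weights = gamma / a * (k ** a - (k - 1.0) ** a)
    # Causal convolution: element j-1 equals sum_{i=1..j} w_{j-i+1} * xi_i.
    return np.convolve(weights, xi)[:n_points]

signal = fbm_weighted_sum(1024, hurst=0.67)
```

For $H = 1/2$ all weights equal one and the sum reduces to the ordinary cumulative sum of the steps, i.e. the usual Brownian motion. The direct convolution costs of the order of $N^2$ operations, which remains affordable for the 16 384 point series considered here.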

We have created six different FBM signals with $H$ values respectively equal
to 0.17, 0.41, 0.47, 0.60, 0.67 and 0.82. Each series has a length of 16 384 ($= 2^{14}$).
They are normalized such that 

Bnit) — 7- . (6) 

The coefficient 1.1 in the denominator and numerator is used in order to 
avoid zero and unity values. The series are shown in Fig. 1. Their characteristics
are summarized in Table 1. We have recalculated the Hurst exponent by the
box counting method. The error bars, given in Table 1 as $\Delta H$,
are those resulting from a root mean square analysis. The linear trend has
been measured and is reported in Table 1; it is small and obviously due to the
finite size of the system.

In order to apply the Zipf method and extract the $\zeta'$ exponent relating the
word frequency and its rank, we have translated the FBM signals into a text
based on two letters, $u$ and $d$, occurring with frequencies $p_u$ and $p_d$ respectively.
The bias, defined as $\epsilon = p_u - 0.5$, has been measured. The bias and also the signal
tendency have been observed as a function of time but are not shown here for
lack of space. They decay quickly, can be positive or negative, and are of the
order of or smaller than 1% after 4000 time steps.
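
The translation of a signal into a two letter text, and the measurement of the bias, can be sketched as follows. The helper names `signal_to_text` and `bias` are hypothetical, and the treatment of a zero increment (counted here as $d$) is an assumption, since that case is not specified in the text.

```python
import numpy as np

def signal_to_text(y):
    """Translate a signal into a 'u'/'d' text, one letter per time step."""
    steps = np.diff(np.asarray(y, dtype=float))
    # Assumption: a zero increment is counted as 'd'.
    return "".join("u" if s > 0 else "d" for s in steps)

def bias(text):
    """epsilon = p_u - 0.5, with p_u the observed frequency of 'u'."""
    return text.count("u") / len(text) - 0.5

text = signal_to_text([0.10, 0.30, 0.20, 0.50, 0.40])
```

Only the sign of each fluctuation is kept, which is what is meant by treating the signal as a constant stepsize random walk.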

The probability distribution functions (pdf) of the logarithm of the signal
volatility, i.e.

$$ Z(t) = \ln(y(t + \Delta t)) - \ln(y(t)), \tag{7} $$

have been fitted to Gauss and stretched exponential distributions. Instead of
ranking the $Z(t)$ values in constant size box histograms, one can, as suggested
by Adamic [40], use binned histograms with bin size increasing exponentially,
thereby obtaining other best fit parameters. The stretched exponential
distribution seems to describe the signal fluctuations well. A Kolmogorov-Smirnov (KS)
test has also been made. Notice that even though the KS distance increases
when the exponential size box is used, this scheme reduces the uncertainty values
and is found to lead to better fits.
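
A minimal sketch of Eq. (7) and of a histogram with exponentially growing bins is given below. Binning the magnitudes $|Z|$ rather than the signed values is an assumption of this example (logarithmic bins need positive values), and both function names are hypothetical.

```python
import numpy as np

def log_increments(y, lag=1):
    """Z(t) = ln y(t + lag) - ln y(t) for a strictly positive signal y."""
    y = np.asarray(y, dtype=float)
    return np.log(y[lag:]) - np.log(y[:-lag])

def log_binned_density(z, n_bins=10):
    """Histogram of |Z| with bin widths growing exponentially.

    Assumption: magnitudes |Z| are binned and zero increments are discarded.
    """
    a = np.abs(np.asarray(z, dtype=float))
    a = a[a > 0]
    edges = np.geomspace(a.min(), a.max(), n_bins + 1)
    edges[0], edges[-1] = a.min(), a.max()   # guard against rounding at the ends
    counts, _ = np.histogram(a, bins=edges)
    density = counts / (counts.sum() * np.diff(edges))
    return edges, density
```

Dividing the counts by the bin widths is what keeps the estimate comparable to a constant bin histogram; the wider bins in the tail simply collect more events, which reduces the scatter there.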

Another test of the stochasticity (or not) of the data is based on the surrogate
data method, in which one either randomizes the sign of the fluctuations
or shuffles their amplitudes, and Fourier transforms the resulting signal in order to
observe whether a white noise signal is so obtained. Finally we have checked
whether the error bars (or confidence intervals) of the raw signal and the
surrogate data signal (not shown here) overlap. The characteristics of the spectral
functions $S(f) \sim f^{-\beta}$ so obtained are available from the authors if necessary.
The results allow us to conclude that the above FBM signals are satisfactory
for further treatment in the presence of a bias to be imposed.
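
The surrogate data test can be sketched as below, under the assumption that the spectral exponent is estimated by a least squares line through the log-log periodogram; the estimator actually used is not stated in the text, and all three function names are hypothetical.

```python
import numpy as np

def shuffled_surrogate(increments, seed=0):
    """Surrogate obtained by shuffling the order of the fluctuations."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.asarray(increments, dtype=float))

def sign_randomized_surrogate(increments, seed=0):
    """Surrogate keeping the amplitudes but drawing each sign at random."""
    rng = np.random.default_rng(seed)
    x = np.abs(np.asarray(increments, dtype=float))
    return x * rng.choice([-1.0, 1.0], size=x.size)

def spectral_exponent(x):
    """Least-squares estimate of beta in S(f) ~ f**(-beta) from the periodogram."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(x.size)
    keep = (freqs > 0) & (power > 0)
    slope, _ = np.polyfit(np.log(freqs[keep]), np.log(power[keep]), 1)
    return -slope
```

A surrogate whose estimated exponent is compatible with zero behaves like white noise, which is the outcome looked for in the test described above.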

Notice that similar histograms of such "words" were already published
for $(m = 3, k = 2)$ and $(m = 5, k = 3)$ respectively, but the authors were more
interested in deviations from randomness than in the Zipf exponent.



3 Zipf Analysis of Fractional Brownian Motion Raw Data

Consider the $(m, 2)$ Zipf method, thus for a two character alphabet only, and
words of arbitrary length $m$. There are therefore $2^m$ different possible words.
We searched whether these words exist in the series of Fig. 1, counted them and
ranked them in decreasing order of frequency, as shown in Fig. 2, for $2 \le m \le 8$. The $\zeta'$
(and $\zeta$ also, but not shown) exponents seem to increase with $m$ (Fig. 3). The
result may be attributed to the finite size of the series, if one realizes that in
such series long words necessarily occur more rarely than
short ones.
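
The counting, ranking and fitting steps described above can be sketched as follows. Several choices are assumptions of this example rather than statements about the original analysis: words are read from overlapping windows, the normalized frequencies $f/f'$ are ranked on their own, and the exponent is taken from an unweighted least squares line. The function names are hypothetical.

```python
from collections import Counter
import numpy as np

def word_frequencies(text, m):
    """Relative frequency of each m-letter word (overlapping windows: an assumption)."""
    counts = Counter(text[i:i + m] for i in range(len(text) - m + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def zipf_slope(values):
    """Exponent from a least-squares line on the log-log rank plot."""
    ranked = np.sort(np.asarray(list(values), dtype=float))[::-1]
    ranks = np.arange(1, ranked.size + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(ranked), 1)
    return -slope

def zipf_exponents(text, m):
    """Return (zeta, zeta_prime) for a text over the alphabet {'u', 'd'}."""
    f = word_frequencies(text, m)
    p_u = text.count("u") / len(text)
    p_d = 1.0 - p_u
    # Normalize each frequency by the theoretical one, f' = p_u^(m-n) * p_d^n.
    f_norm = [f[w] / (p_u ** w.count("u") * p_d ** w.count("d")) for w in f]
    return zipf_slope(f.values()), zipf_slope(f_norm)
```

Words that never occur are absent from the ranking, so for large $m$ and short texts the fit runs over fewer than $2^m$ points.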

As a test of finite size effects, we have successively removed one by one
the least frequently occurring words, and recalculated the $\zeta'$ (and $\zeta$) exponents,
in some sense taking a rank $\to 0$ limit, or first derivative. The exponents
resulting from the average of the latter values are shown in Fig. 4. It is found
that for large $H$, i.e. a persistent signal, the exponents are rather constant, but
they still vary markedly for the antipersistent signals.
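
The successive removal procedure can be sketched as follows, assuming that the fit is redone on the remaining highest ranked words each time the lowest ranked one is dropped, down to a minimum number of words (`min_words`, a hypothetical parameter). The function name is also hypothetical.

```python
import numpy as np

def trimmed_zipf_exponents(frequencies, min_words=3):
    """Refit the Zipf exponent after removing the rarest words one by one."""
    ranked = np.sort(np.asarray(frequencies, dtype=float))[::-1]
    log_rank = np.log(np.arange(1, ranked.size + 1))
    log_freq = np.log(ranked)
    exponents = []
    for n_kept in range(ranked.size, min_words - 1, -1):
        slope, _ = np.polyfit(log_rank[:n_kept], log_freq[:n_kept], 1)
        exponents.append(-slope)
    return np.array(exponents)

# Example on an exact power law: every refit returns the same exponent.
example = trimmed_zipf_exponents(np.arange(1.0, 17.0) ** -0.7)
average_exponent = example.mean()
```

On real word counts the successive values differ, and their average is the quantity that would be compared with the untrimmed exponent.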

As a further step, we can also consider whether there is a Zipf law evolution,
i.e. search for the evolution of the $\zeta'$ exponent as a function of time $t$, or as a function
of the number of points in the series. We have calculated values of $\zeta'$ and $\zeta$ for
the (first) box containing 100 points, then for a box containing 200, 300, etc.
points up to 16 000. The evolution (Fig. 5) is rather drastic for the first, say,
6000 points but is moderately stable thereafter.
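
The growing box procedure can be sketched as below. For brevity the example computes only the unnormalized exponent $\zeta$, with overlapping words and an unweighted least squares fit (both assumptions); the function names are hypothetical.

```python
from collections import Counter
import numpy as np

def zipf_exponent(text, m):
    """Unnormalized Zipf exponent of the m-letter words found in `text`."""
    counts = Counter(text[i:i + m] for i in range(len(text) - m + 1))
    ranked = np.sort(np.array(list(counts.values()), dtype=float))[::-1]
    if ranked.size < 2:
        return float("nan")            # a fit needs at least two distinct words
    ranks = np.arange(1, ranked.size + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(ranked / ranked.sum()), 1)
    return -slope

def growing_box_exponents(text, m, step=100):
    """Exponent for the first `step` letters, then 2*step, 3*step, and so on."""
    sizes = list(range(step, len(text) + 1, step))
    return sizes, [zipf_exponent(text[:n], m) for n in sizes]
```

Plotting the returned exponents against the box sizes gives the kind of curve shown in Fig. 5.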

Next recall the so called local (or better, instantaneous) DFA method
[35, 39]. In order to probe the existence of correlated and/or
decorrelated sequences, a so-called observation box of finite size is
constructed and placed at the beginning of the data. A DFA is performed on
the data contained in that box. The box is then moved along the historical time
axis by a few points toward the right along the sequence. Iterating this
procedure along the sequence, a "local measurement" of the "degree of correlations"
is obtained, i.e. a local measure of the Hurst exponent in the DFA case. The
results indicate that the $H$ exponent value varies with time. This behavior, observed in
finance, is similar to what is observed along DNA sequences in biology, where the $H$
exponent drops below 1/2 in so-called non-coding regions. Doing the same here,
we obtain a local Zipf law and a local Zipf exponent. Showing a full list of figures
or data as a function of $m \in [2, 8]$ would generate a quite aversive stimulus in
the reader; therefore only the case $m = 5$ is illustrated here. This value is
chosen in view of the financial background having motivated this study, i.e.
$m = 5$ is the (true) length of a week!
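
A local Zipf exponent along the text can be sketched with a sliding observation box. The shift of the box between two measurements (`shift`) is a hypothetical parameter, as the text only mentions "a few points"; the unnormalized exponent, overlapping words and an unweighted fit are again assumptions of the example.

```python
from collections import Counter
import numpy as np

def zipf_exponent(text, m):
    """Unnormalized Zipf exponent of the m-letter words found in `text`."""
    counts = Counter(text[i:i + m] for i in range(len(text) - m + 1))
    ranked = np.sort(np.array(list(counts.values()), dtype=float))[::-1]
    if ranked.size < 2:
        return float("nan")            # a fit needs at least two distinct words
    ranks = np.arange(1, ranked.size + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(ranked / ranked.sum()), 1)
    return -slope

def local_zipf_exponents(text, m=5, window=250, shift=10):
    """Zipf exponent inside a box of `window` letters moved along the text."""
    starts = list(range(0, len(text) - window + 1, shift))
    return starts, [zipf_exponent(text[s:s + window], m) for s in starts]
```

With $m = 5$ a box of 250 letters contains 246 overlapping words spread over at most 32 different ones, so the local estimates are expected to fluctuate more than the exponent of the whole series.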

Results for the three FBM signals with $H$ close to 0.5, as in financial time
series signals, are illustrated in Figs. 6-8, considering windows (boxes) of size
250, 500 and 1000 respectively, moved along the signal. These values correspond
to 1, 2 and 4 year investment windows in finance. Notice that the local
exponents are usually larger than the corresponding average one, in some sense
corroborating the results of the previous finite size effect analysis. In turn it seems
that the method is also of interest in order to observe short range correlations in
fractional Brownian motions, and non ergodic properties of finite size series.

4 Zipf Analysis of Fractional Brownian Motion + Linear Trend Data

Next consider an FBM on which a linear trend with amplitude $A_L$ is additionally
superposed (the trend being equal to zero at the origin), i.e. we add

$$ y_L(t) = A_L \, t \tag{8} $$

to Eq. (4). We have taken values of $A_L$ that are negative powers of 2, in a range depending
on the FBM considered. Only the case $k = 2$ is considered at this stage, again
as a first step and having in mind scales used in other works in econophysics
and electronics. The Zipf plots are shown in Fig. 9. For small $H$ a sharp drop is
still seen at large $R$ values. The amplitude (and trend) effect is hardly noticeable
at small $H$, even with a large trend, while linearity (on a log-log plot) is,
interestingly, observed for larger $H$.
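
The way an additive linear trend biases the letter statistics can be illustrated with a small sketch. The helper names are hypothetical, and the alternating test signal is chosen only because its behavior is easy to follow by hand; it is not one of the FBM series studied here.

```python
import numpy as np

def add_linear_trend(y, slope):
    """Return y(t) + slope * t, with the trend equal to zero at the origin."""
    y = np.asarray(y, dtype=float)
    return y + slope * np.arange(y.size)

def up_fraction(y):
    """Fraction p_u of upward steps; the bias is p_u - 0.5."""
    return float(np.mean(np.diff(y) > 0))

# Alternating signal: steps of -0.02 and +0.02, hence no bias without trend.
raw = 0.01 * (-1.0) ** np.arange(101)
p_u_raw = up_fraction(raw)
# A slope larger than the step size turns every step into an upward one.
p_u_trended = up_fraction(add_linear_trend(raw, 0.05))
```

Each increment is shifted by the slope, so the trend matters once it becomes comparable to the typical size of the fluctuations, which is consistent with the existence of a crossover amplitude.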

The effect of the trend when $m$ is larger than 2, thus up to $m = 8$ as in the
preceding section, is illustrated in Fig. 10. As should be expected the exponent
$\zeta'$ is rather stable ($= \zeta_0$) at small trend (or slope) $A_L$ and small $m$, but
markedly increases when $m$ increases. The $\zeta'$ stability is apparently observed
below some sort of crossover amplitude $A_{L,cr}(m)$, depending on $m$. The
variation above such a crossover amplitude follows a power law ($\zeta' \sim A_L^{\theta_L}$)
which has been determined. The values of the exponent $\zeta_0$ and of the power law
exponent $\theta_L$ on both sides of $A_{L,cr}$ are given in Table 2, with the error bar
$\Delta\theta_L$.

We should point out that there seems to be some difference depending on whether the FBM
signal is persistent or not. For antipersistent signals, $\zeta'$ is quasi independent of $A_L$.
The Brownian motion is clearly an in-between case.

5 Conclusions 

We have considered fractional Brownian motions on which a linear trend is
superimposed. We have translated each signal into a text based on two letters,
$u$ and $d$ (or 0 and 1 bits), according to the fluctuations in the corresponding
random walk. We have studied the variation of the Zipf exponent(s) giving the
relationship between the frequency of occurrence of words of length $m < 8$ made
of such "letters" for a binary alphabet. We have examined how the $\zeta'$ exponent
of the Zipf law is influenced by the trend and its amplitude. We can distinguish
finite size effects, and results depending on whether the starting FBM is persistent
or not, i.e. depending on the Hurst exponent. It seems that $\zeta'$ varies as a power law
in terms of $m$. This seems to be due to the fact that, because of the trend, a marked
bias occurs in the relative fluctuations, and thus in the word occurrences. It seems
numerically established, even though this might have been expected, that the
short range correlations of a persistent signal, as analyzed through the Zipf method, are more influenced
by a trend than those of an antipersistent signal.

In the spirit of the local DFA, we have introduced considerations based on the
effect of finite sizes of texts, and on the notion of a local (or "time" dependent)
Zipf law, in order to show their influence on the exponent values.

Finally, coming back to the Czirok et al. conjecture, it seems of
interest to display the relationship for both studied classes of cases, i.e. (i) the
raw FBM, and (ii) the FBM + linear trend. The results are shown in Figs. 11-12.
Even within considerations taking into account finite size effects and a poor
man's statistical analysis, it appears that such a law only holds near $H = 0.5$.
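
For reference, the values that the conjectured law would give for the six $H$ values studied here follow from simple arithmetic. The snippet below only evaluates $|2H - 1|$; it does not reproduce the measured exponents of Figs. 11-12.

```python
# Hurst exponents of the six FBM series studied here.
hurst_values = [0.17, 0.41, 0.47, 0.60, 0.67, 0.82]

# Exponent predicted by the conjectured law zeta' = |2H - 1|.
conjectured = {h: abs(2 * h - 1) for h in hurst_values}

for h, z in conjectured.items():
    print(f"H = {h:.2f}  ->  |2H - 1| = {z:.2f}")
```

The conjectured exponent vanishes at $H = 0.5$ and is symmetric around it, so a strongly antipersistent and a strongly persistent signal would be expected to have similar exponents if the law held over the whole range.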

Other considerations are in order, showing that many other cases can still
be considered: first one could wonder more about signal stationarity effects.
Next, either a non linear trend (e.g. a power law or a moving average) or a
periodic background could be superposed on the raw signal. Also multiplicative
rather than additive trends could be used. This should be put in line with the
remarkable studies of Hu et al. on DFA, but is a work an order of magnitude
larger here because there are the $m$ and $k$ parameters to consider. Applications of the
above to financial data will be presented elsewhere.

Figure Captions

Figure 1 - Six fractional Brownian motions studied in the text, each characterized
by an $H$ exponent, with characteristics summarized in Table 1

Figure 2 - $(m, 2)$ Zipf plots of the six FBM for $2 \le m \le 8$

Figure 3 - Evolution of the $\zeta'$ exponent with $m$ for different $H$ values

Figure 4 - Evolution of the $\zeta'$ exponent calculated when successively
removing one by one the least frequently occurring words, as a function of $m$ for
different $H$ values

Figure 5 - Zipf law $\zeta'$ exponent evolution as a function of time $t$, starting
from the (first) box containing 100 points, then for a box containing 200, 300,
etc. points up to 16 000

Figure 6 - Time dependence of the local Zipf exponent in a box of size $T$
displaced along the FBM with $H = 0.41$; three box sizes are illustrated: 250,
500 and 1000

Figure 7 - Time dependence of the local Zipf exponent in a box of size $T$
displaced along the FBM with $H = 0.47$; three box sizes are illustrated: 250,
500 and 1000

Figure 8 - Time dependence of the local Zipf exponent in a box of size $T$
displaced along the FBM with $H = 0.60$; three box sizes are illustrated: 250,
500 and 1000

Figure 9 - $(m, 2)$ Zipf plots of the six FBM + linear trend for $2 \le m \le 8$,
with the slope $A_L$ of the linear trend given in the insert

Figure 10 - Variation of the $\zeta'$ exponent as a function of the slope of the trend
when $m$ is larger than 2 and less than 8

Figure 11 - "Verification" of the relationship $\zeta' = |2H - 1|$ for the six FBM

Figure 12 - "Verification" of the relationship $\zeta' = |2H - 1|$ for the six FBM
+ linear trend for different trends

Table 1: Characteristics of the six raw fractional Brownian motions (FBM)
studied here (Fig. 1): Hurst exponent $H$, error $\Delta H$ on $H$ through the box counting
method, trend and bias of the FBM

$H$             0.17     0.41      0.47       0.60     0.67     0.82
$\Delta H$      0.02     0.01      0.02       0.01     0.01     0.02
trend (10^-?)   1.09     3.03      -3.2       2.48     4.51     -6.22
bias            0.0084   -0.0055   -0.00575   -0.004   0.0143   0.0038

m                      4        5        6        7        8

1. H ~ 0.17
   $\zeta_0$           0.2311   0.2336   0.2524   0.2753   0.2982

2. H ~ 0.41
   $\zeta_0$           0.1005   0.1001   0.1179   0.1304   0.1445
   $\theta_L$          1.3470   1.3325   1.2884   1.1926   1.1412
   $\Delta\theta_L$    0.0008   0.0013   0.0006   0.0001   0.0004

3. H ~ 0.47
   $\zeta_0$           0.0164   0.0231   0.0307   0.0456   0.0725
   $\theta_L$          1.0816   1.1053   1.0817   1.0773   1.0257
   $\Delta\theta_L$    0.0014   0.0027   0.0017   0.0016   0.0004

4. H ~ 0.60
   $\zeta_0$           0.2743   0.2773   0.2783   0.2816   0.2854
   $\theta_L$          0.7392   0.7220   0.7162   0.7095   0.6943
   $\Delta\theta_L$    0.0012   0.0015   0.0015   0.0015   0.0012

5. H ~ 0.67
   $\zeta_0$           0.4658   0.4554   0.4501   0.4541   0.4617
   $\theta_L$          0.7939   0.7653   0.7252   0.6824   0.6381
   $\Delta\theta_L$    0.0054   0.0039   0.0025   0.0010   0.0010

6. H ~ 0.82
   $\zeta_0$           0.5381   0.5215   0.5098   0.5056   0.5083
   $\theta_L$          0.8886   0.8457   0.8048   0.7636   0.7213
   $\Delta\theta_L$    0.0108   0.0094   0.0077   0.0060   0.0047

Table 2: Values of the exponent $\zeta_0$ and of its power law variation exponent $\theta_L$ as a function
of the slope of the trend on both sides of $A_{L,cr}$




[Figure 10, panels (a) H ~ 0.17, (b) H ~ 0.41, (d) H ~ 0.60: $\zeta'$ exponent versus the slope $A_L$ of the trend on logarithmic axes, with curves for m = 4 to 8]